Phrase-based statistical machine translation: models, search, training

Author

  • Richard Zens
Abstract

Machine translation is the task of automatically translating a text from one natural language into another. In this work, we describe and analyze the phrase-based approach to statistical machine translation. Any statistical approach to machine translation has to address three problems: the modeling problem, i.e., how to structure the dependencies between source and target language sentences; the search problem, i.e., how to find the best translation candidate among all possible target language sentences; and the training problem, i.e., how to estimate the free parameters of the model from the training data. We present improved alignment and translation models: our alignment models improve alignment quality significantly, and we describe several phrase translation models and analyze their contribution to the overall translation quality. We formulate the search problem for phrase-based statistical machine translation and present different search algorithms in detail. Our analysis of the search shows that it is important to focus on alternative reorderings, whereas a small number of lexical alternatives is already sufficient to achieve good translation quality. The reordering problem in machine translation is difficult for two reasons: first, it is computationally expensive to explore all possible permutations; second, it is hard to select a good permutation. We compare different reordering constraints to solve this problem efficiently and introduce a lexicalized reordering model to find better reorderings. We investigate alternative training criteria for phrase-based statistical machine translation. In this context, we generalize the known word posterior probabilities to n-gram posterior probabilities. The resulting machine translation system achieves state-of-the-art performance on the large-scale Chinese-English NIST task.
Furthermore, the system was ranked first in the official TC-Star evaluations in 2005, 2006, and 2007 for the Chinese-English broadcast news speech translation task.
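To illustrate why reordering constraints matter for search, here is a minimal Python sketch. It is not taken from the thesis. It implements one well-known constraint, the so-called IBM constraint: when a hypothesis is extended, the next source position must be one of the first k still-uncovered positions. For simplicity the sketch works on single source words rather than phrases, and the function names `allowed_under_ibm` and `count_ibm_orders` are hypothetical. It compares the number of permitted orders with the n! permutations an unconstrained search would have to consider.

```python
from itertools import permutations
from math import factorial


def allowed_under_ibm(order, k):
    """Return True if a source-position order obeys the IBM constraint:
    every step must pick one of the first k still-uncovered positions."""
    uncovered = sorted(order)
    for pos in order:
        if pos not in uncovered[:k]:
            return False
        uncovered.remove(pos)  # mark this source position as covered
    return True


def count_ibm_orders(n, k):
    """Closed-form count: with m positions left, there are min(k, m) choices."""
    count = 1
    for m in range(n, 0, -1):
        count *= min(k, m)
    return count


if __name__ == "__main__":
    n = 6
    for k in (1, 2, 3, n):
        # Brute-force check over all n! permutations of the source positions.
        brute = sum(allowed_under_ibm(p, k) for p in permutations(range(n)))
        print(f"k={k}: {count_ibm_orders(n, k)} of {factorial(n)} orders "
              f"(brute force: {brute})")
```

With k = 1 only the monotone order survives. With k = n the constraint admits all n! permutations. Intermediate values of k trade search-space size against reordering flexibility.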
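The generalization from word posteriors to n-gram posteriors can be illustrated on an N-best list. The following Python sketch shows one simple, position-independent variant. Sentence-level posteriors are obtained by normalizing the (optionally scaled) model log-scores of the N-best hypotheses. Each n-gram then receives the summed posterior of every hypothesis that contains it at least once. This is an illustrative assumption, not the thesis's exact definition, which may differ in how occurrences are counted and how scores are scaled. The function names and the `scale` parameter are hypothetical.

```python
import math
from collections import defaultdict


def sentence_posteriors(scores, scale=1.0):
    """Turn model log-scores of N-best hypotheses into normalized posteriors."""
    m = max(scores)
    # Subtract the maximum before exponentiating to avoid overflow.
    weights = [math.exp(scale * (s - m)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def ngram_posteriors(nbest, scores, n, scale=1.0):
    """Position-independent n-gram posteriors: each n-gram receives the summed
    posterior of every hypothesis that contains it at least once."""
    posts = sentence_posteriors(scores, scale)
    result = defaultdict(float)
    for hyp, p in zip(nbest, posts):
        tokens = hyp.split()
        # A set, so an n-gram repeated within one hypothesis is counted once.
        ngrams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        for g in ngrams:
            result[g] += p
    return dict(result)


if __name__ == "__main__":
    nbest = ["the house is small", "the house is little", "a house is small"]
    scores = [-1.0, -2.0, -3.0]  # hypothetical model log-scores
    for gram, post in sorted(ngram_posteriors(nbest, scores, 2).items()):
        print(" ".join(gram), round(post, 3))
```

With n = 1 this reduces to position-independent word posteriors. Because each hypothesis contributes at most once per n-gram, every posterior lies between 0 and 1.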

Similar resources

Neural Machine Translation Leveraging Phrase-based Models in a Hybrid Search

In this paper, we introduce a hybrid search for attention-based neural machine translation (NMT). A target phrase learned with statistical MT models extends a hypothesis in the NMT beam search when the attention of the NMT model focuses on the source words translated by this phrase. Phrases added in this way are scored with the NMT model, but also with SMT features including phrase-level transl...

A Detailed Analysis of Phrase-based and Syntax-based Machine Translation: The Search for Systematic Differences

This paper describes a range of automatic and manual comparisons of phrase-based and syntax-based statistical machine translation methods applied to English-German and English-French translation of user-generated content. The syntax-based methods underperform the phrase-based models and the relaxation of syntactic constraints to broaden translation rule coverage means that these models do not n...

Document-Wide Decoding for Phrase-Based Statistical Machine Translation

Independence between sentences is an assumption deeply entrenched in the models and algorithms used for statistical machine translation (SMT), particularly in the popular dynamic programming beam search decoding algorithm. This restriction is an obstacle to research on more sophisticated discourse-level models for SMT. We propose a stochastic local search decoding method for phrase-based SMT, w...

NUT-NTT statistical machine translation system for IWSLT 2005

In this paper, we present a novel distortion model for phrase-based statistical machine translation. Unlike the previous phrase distortion models whose role is to simply penalize nonmonotonic alignments [1, 2], the new model assigns the probability of relative position between two source language phrases aligned to the two adjacent target language phrases. The phrase translation probabilities an...

Publication date: 2008